Apprentissage et optimisation de politiques pour un bras articulé actionné par des muscles [Policy learning and optimization for a muscle-actuated articulated arm]
Internal identifier: 001693 (Main/Exploration)
Authors: Didier Marin [France]; Lionel Rigoux [France]; Olivier Sigaud [France]
Source:
- Revue d'intelligence artificielle [0992-499X]; 2013.
French descriptors
- Pascal (Inist)
- Wicri:
English descriptors
- KwdEn:
Abstract
Many studies combine learning from demonstration with policy improvement methods to learn a robot controller along a specific trajectory. However, these works lack the ability to learn over the whole reachable space of the robot. In this paper, we propose a method that learns a reactive, near-optimal feedback controller in two steps. First, an efficient parametric feedback controller is obtained by learning from demonstration, using trajectories computed by a costly near-optimal controller. Second, this feedback controller is further optimized with direct policy search methods. The resulting controller executes 20,000 times faster than the original controller while achieving similar performance. The method is evaluated in simulation.
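The two-step scheme summarized in the abstract can be illustrated with a minimal sketch. Everything in it is a hypothetical stand-in rather than the authors' method: a 1-D point mass replaces the muscle-actuated arm, a fixed PD law (`demonstrator`) plays the role of the costly near-optimal controller, a two-parameter linear feedback policy is fitted by least squares (learning from demonstration), and a simple (1+1) random-search hill climber stands in for the direct policy search methods used in the paper. The cost is summed over several targets to mimic learning over the reachable space instead of along a single trajectory.

```python
import random

# All names and constants below are hypothetical illustrations, not the authors' code.
DT = 0.05       # integration step in seconds
HORIZON = 60    # control steps per rollout


def step(x, v, u):
    """Toy 1-D point mass standing in for the muscle-actuated arm (semi-implicit Euler)."""
    v = v + DT * u
    x = x + DT * v
    return x, v


def features(x, v, target):
    """Features of a linear feedback policy: position error and velocity."""
    return x - target, v


def policy(theta, x, v, target):
    """Parametric feedback controller u = theta . phi(state)."""
    f1, f2 = features(x, v, target)
    return theta[0] * f1 + theta[1] * f2


def demonstrator(x, v, target):
    """Stand-in for the costly near-optimal controller (here a cheap fixed PD law)."""
    return -4.0 * (x - target) - 3.0 * v


def rollout_cost(theta, targets):
    """Squared tracking error plus a small effort penalty, summed over several targets."""
    total = 0.0
    for target in targets:
        x, v = 0.0, 0.0
        for _ in range(HORIZON):
            u = policy(theta, x, v, target)
            x, v = step(x, v, u)
            total += (x - target) ** 2 + 1e-3 * u ** 2
    return total


def learn_from_demonstration(targets):
    """Step 1: least-squares fit of theta to (state, action) pairs produced by the demonstrator."""
    a11 = a12 = a22 = b1 = b2 = 0.0  # normal equations A theta = b (2x2)
    for target in targets:
        x, v = 0.0, 0.0
        for _ in range(HORIZON):
            u = demonstrator(x, v, target)
            f1, f2 = features(x, v, target)
            a11 += f1 * f1
            a12 += f1 * f2
            a22 += f2 * f2
            b1 += f1 * u
            b2 += f2 * u
            x, v = step(x, v, u)
    det = a11 * a22 - a12 * a12
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)


def policy_search(theta, targets, iters=200, sigma=0.5, seed=0):
    """Step 2: (1+1) random-search hill climbing, a simple direct policy search."""
    rng = random.Random(seed)
    best, best_cost = tuple(theta), rollout_cost(theta, targets)
    for _ in range(iters):
        candidate = tuple(t + rng.gauss(0.0, sigma) for t in best)
        cost = rollout_cost(candidate, targets)
        if cost < best_cost:  # keep only strict improvements
            best, best_cost = candidate, cost
    return best, best_cost


if __name__ == "__main__":
    targets = [0.2, 0.5, 0.8]  # hypothetical reachable positions
    theta0 = learn_from_demonstration(targets)
    theta1, cost1 = policy_search(theta0, targets)
    print("fitted:", theta0, rollout_cost(theta0, targets))
    print("refined:", theta1, cost1)
```

Because the toy demonstrator is itself linear in the chosen features, step 1 recovers its gains exactly; step 2 then only accepts candidates that lower the rollout cost, so the refined policy is never worse than the fitted one on these targets. The sketch does not attempt to reproduce the speed-up reported in the abstract.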
Affiliations:
Links to previous steps (curation, corpus, ...)
- to stream PascalFrancis, to step Corpus: 000062
- to stream PascalFrancis, to step Curation: 000945
- to stream PascalFrancis, to step Checkpoint: 000038
- to stream Main, to step Merge: 001709
- to stream Main, to step Curation: 001693
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="fr" level="a">Apprentissage et optimisation de politiques pour un bras articulé actionné par des muscles</title>
<author><name sortKey="Marin, Didier" sort="Marin, Didier" uniqKey="Marin D" first="Didier" last="Marin">Didier Marin</name>
<affiliation wicri:level="3"><inist:fA14 i1="01"><s1>Institut des Systèmes Intelligents et de Robotique UPMC-Paris 6, CNRS UMR 7222 4 place Jussieu</s1>
<s2>75252 Paris</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName><region type="region" nuts="2">Île-de-France</region>
<settlement type="city">Paris</settlement>
</placeName>
</affiliation>
</author>
<author><name sortKey="Rigoux, Lionel" sort="Rigoux, Lionel" uniqKey="Rigoux L" first="Lionel" last="Rigoux">Lionel Rigoux</name>
<affiliation wicri:level="3"><inist:fA14 i1="01"><s1>Institut des Systèmes Intelligents et de Robotique UPMC-Paris 6, CNRS UMR 7222 4 place Jussieu</s1>
<s2>75252 Paris</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName><region type="region" nuts="2">Île-de-France</region>
<settlement type="city">Paris</settlement>
</placeName>
</affiliation>
</author>
<author><name sortKey="Sigaud, Olivier" sort="Sigaud, Olivier" uniqKey="Sigaud O" first="Olivier" last="Sigaud">Olivier Sigaud</name>
<affiliation wicri:level="3"><inist:fA14 i1="01"><s1>Institut des Systèmes Intelligents et de Robotique UPMC-Paris 6, CNRS UMR 7222 4 place Jussieu</s1>
<s2>75252 Paris</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName><region type="region" nuts="2">Île-de-France</region>
<settlement type="city">Paris</settlement>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">INIST</idno>
<idno type="inist">13-0216766</idno>
<date when="2013">2013</date>
<idno type="stanalyst">PASCAL 13-0216766 INIST</idno>
<idno type="RBID">Pascal:13-0216766</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000062</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000945</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000038</idno>
<idno type="wicri:explorRef" wicri:stream="PascalFrancis" wicri:step="Checkpoint">000038</idno>
<idno type="wicri:doubleKey">0992-499X:2013:Marin D:apprentissage:et:optimisation</idno>
<idno type="wicri:Area/Main/Merge">001709</idno>
<idno type="wicri:Area/Main/Curation">001693</idno>
<idno type="wicri:Area/Main/Exploration">001693</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="fr" level="a">Apprentissage et optimisation de politiques pour un bras articulé actionné par des muscles</title>
<author><name sortKey="Marin, Didier" sort="Marin, Didier" uniqKey="Marin D" first="Didier" last="Marin">Didier Marin</name>
<affiliation wicri:level="3"><inist:fA14 i1="01"><s1>Institut des Systèmes Intelligents et de Robotique UPMC-Paris 6, CNRS UMR 7222 4 place Jussieu</s1>
<s2>75252 Paris</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName><region type="region" nuts="2">Île-de-France</region>
<settlement type="city">Paris</settlement>
</placeName>
</affiliation>
</author>
<author><name sortKey="Rigoux, Lionel" sort="Rigoux, Lionel" uniqKey="Rigoux L" first="Lionel" last="Rigoux">Lionel Rigoux</name>
<affiliation wicri:level="3"><inist:fA14 i1="01"><s1>Institut des Systèmes Intelligents et de Robotique UPMC-Paris 6, CNRS UMR 7222 4 place Jussieu</s1>
<s2>75252 Paris</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName><region type="region" nuts="2">Île-de-France</region>
<settlement type="city">Paris</settlement>
</placeName>
</affiliation>
</author>
<author><name sortKey="Sigaud, Olivier" sort="Sigaud, Olivier" uniqKey="Sigaud O" first="Olivier" last="Sigaud">Olivier Sigaud</name>
<affiliation wicri:level="3"><inist:fA14 i1="01"><s1>Institut des Systèmes Intelligents et de Robotique UPMC-Paris 6, CNRS UMR 7222 4 place Jussieu</s1>
<s2>75252 Paris</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName><region type="region" nuts="2">Île-de-France</region>
<settlement type="city">Paris</settlement>
</placeName>
</affiliation>
</author>
</analytic>
<series><title level="j" type="main">Revue d'intelligence artificielle</title>
<title level="j" type="abbreviated">Rev. intell. artif.</title>
<idno type="ISSN">0992-499X</idno>
<imprint><date when="2013">2013</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><title level="j" type="main">Revue d'intelligence artificielle</title>
<title level="j" type="abbreviated">Rev. intell. artif.</title>
<idno type="ISSN">0992-499X</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Arm</term>
<term>Capability index</term>
<term>Direct method</term>
<term>Entropy</term>
<term>Feedback regulation</term>
<term>Muscle</term>
<term>Optimal control</term>
<term>Optimal control (mathematics)</term>
<term>Optimization</term>
<term>Policy</term>
<term>Robotics</term>
<term>Search algorithm</term>
<term>Space application</term>
<term>Stochastic control</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Robotique</term>
<term>Rétroaction</term>
<term>Politique</term>
<term>Bras</term>
<term>Muscle</term>
<term>Indice aptitude</term>
<term>Commande optimale</term>
<term>Optimisation</term>
<term>Application spatiale</term>
<term>Méthode directe</term>
<term>Algorithme recherche</term>
<term>Commande stochastique</term>
<term>Entropie</term>
<term>Contrôle optimal</term>
</keywords>
<keywords scheme="Wicri" type="topic" xml:lang="fr"><term>Robotique</term>
<term>Politique</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Many studies combine learning from demonstration with policy improvement methods to learn a robot controller along a specific trajectory. However, these works lack the ability to learn over the whole reachable space of the robot. In this paper, we propose a method that learns a reactive, near-optimal feedback controller in two steps. First, an efficient parametric feedback controller is obtained by learning from demonstration, using trajectories computed by a costly near-optimal controller. Second, this feedback controller is further optimized with direct policy search methods. The resulting controller executes 20,000 times faster than the original controller while achieving similar performance. The method is evaluated in simulation.</div>
</front>
</TEI>
<affiliations><list><country><li>France</li>
</country>
<region><li>Île-de-France</li>
</region>
<settlement><li>Paris</li>
</settlement>
</list>
<tree><country name="France"><region name="Île-de-France"><name sortKey="Marin, Didier" sort="Marin, Didier" uniqKey="Marin D" first="Didier" last="Marin">Didier Marin</name>
</region>
<name sortKey="Rigoux, Lionel" sort="Rigoux, Lionel" uniqKey="Rigoux L" first="Lionel" last="Rigoux">Lionel Rigoux</name>
<name sortKey="Sigaud, Olivier" sort="Sigaud, Olivier" uniqKey="Sigaud O" first="Olivier" last="Sigaud">Olivier Sigaud</name>
</country>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Wicri/Lorraine/explor/InforLorV4/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001693 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001693 | SxmlIndent | more
To link to this page in the Wicri network
{{Explor lien |wiki= Wicri/Lorraine |area= InforLorV4 |flux= Main |étape= Exploration |type= RBID |clé= Pascal:13-0216766 |texte= Apprentissage et optimisation de politiques pour un bras articulé actionné par des muscles }}
This area was generated with Dilib version V0.6.33.